@ampandey-AMD ampandey-AMD commented Nov 11, 2025

Summary:

  • Track runtime shutdown via AmdgpuMemFuncs::IsAmdgpuRuntimeShutdown()
    and gate further hsa_amd_pointer_info calls on it.
  • Intercept the 'hsa_init' API call.
  • Register for hsa-runtime system events via
    'hsa_amd_register_system_event_handler'.
    • The registered HSA event is 'HSA_AMD_SYSTEM_SHUTDOWN_EVENT'.
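
For readers skimming the thread, the gating idea in the summary can be sketched roughly as below. This is only an illustration under stated assumptions: `PointerInfo`, `QueryRuntime`, and everything in the `sketch` namespace are hypothetical stand-ins, not the PR's code, and the real query is hsa_amd_pointer_info.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in type; the PR's real types live in the sanitizer runtime.
struct PointerInfo {
  std::uintptr_t base = 0;
  std::size_t size = 0;
};

namespace sketch {

// Set once by the shutdown callback, read by every gated entry point.
std::atomic<bool> runtime_shutdown{false};

bool IsRuntimeShutdown() {
  return runtime_shutdown.load(std::memory_order_acquire);
}

// Would be invoked from the handler registered for the shutdown event.
void NotifyRuntimeShutdown() {
  runtime_shutdown.store(true, std::memory_order_release);
}

// Stand-in for the real runtime query (hsa_amd_pointer_info in the PR).
bool QueryRuntime(std::uintptr_t ptr, PointerInfo *info) {
  info->base = ptr;
  info->size = 4096;
  return true;
}

// Gate: never call into the runtime once it has reported shutdown.
bool GetPointerInfo(std::uintptr_t ptr, PointerInfo *info) {
  assert(info != nullptr);
  if (IsRuntimeShutdown())
    return false;  // leave *info untouched
  return QueryRuntime(ptr, info);
}

}  // namespace sketch
```

Before `NotifyRuntimeShutdown()` runs, `GetPointerInfo` forwards to the runtime; afterwards it returns false without touching the output struct.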

@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch from 84f2839 to 4e6a327 Compare November 24, 2025 06:51
@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch from 4e6a327 to 02dccdb Compare November 24, 2025 06:56
@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch 2 times, most recently from fc9cd0d to b78dbb5 Compare November 24, 2025 07:03
@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch from d41eca8 to 78e3a56 Compare November 27, 2025 14:03
@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch from 78e3a56 to e8ae56c Compare December 2, 2025 08:51
@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch from e8ae56c to 894289b Compare December 2, 2025 09:20
@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch from 894289b to d9c4492 Compare December 3, 2025 05:43
Summary:
  - Add wrapper IsAmdgpuRuntimeShutdown for checking the shutdown state.
  - Add wrapper NotifyAmdgpuRuntimeShutdown for updating the shutdown state.
  - Add verbose logging.
  - Refactor AmdgpuMemFuncs::Init.
  - Add a CAS check when registering the AMDGPU shutdown callback.
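
A CAS check of this kind usually means that only the first caller to flip a flag performs the registration. A minimal sketch, assuming a simple 0/1 state word; `RegisterWithRuntime`, `RegisterShutdownCallbackOnce`, and the other names here are hypothetical and do not come from the PR:

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

using ShutdownCallback = void (*)();

// 0 = not registered, 1 = registered. Hypothetical state, not the PR's.
std::atomic<int> callback_registered{0};
// Counts how often the stand-in registrar actually ran.
std::atomic<int> register_calls{0};

// Stand-in for the real registration call; always succeeds here.
bool RegisterWithRuntime(ShutdownCallback cb) {
  assert(cb != nullptr);
  register_calls.fetch_add(1, std::memory_order_relaxed);
  return true;
}

// Returns true only for the caller that performed the registration.
bool RegisterShutdownCallbackOnce(ShutdownCallback cb) {
  int expected = 0;
  // Only one caller can move the state from 0 to 1; every other caller
  // sees the compare-exchange fail and skips registration.
  if (!callback_registered.compare_exchange_strong(
          expected, 1, std::memory_order_acq_rel))
    return false;
  return RegisterWithRuntime(cb);
}

// Example callback with nothing to do.
void OnShutdown() {}

}  // namespace sketch
```

With this shape, repeated calls (for example from several intercepted hsa_init invocations) would reach the registrar at most once.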
@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch from afdf142 to 5fefd6d Compare December 3, 2025 05:45
@ampandey-AMD ampandey-AMD force-pushed the fix-swdev-519413-mi300 branch from 993355c to 5fefd6d Compare December 3, 2025 05:47

void *AmdgpuMemFuncs::Allocate(uptr size, uptr alignment,
                               DeviceAllocationInfo *da_info) {
  // Do not allocate if AMDGPU runtime is shutdown
  if (IsAmdgpuRuntimeShutdown()) {

Should we use UNLIKELY here?
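
For context on the suggestion: UNLIKELY is a branch hint, and sanitizer_common defines it on GCC/Clang as `__builtin_expect(!!(x), 0)`. The sketch below shows what the hinted check could look like. `SKETCH_UNLIKELY`, the fallback branch for other compilers, and the `sketch::Allocate` function are illustrative assumptions, not the PR's code.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Same shape as the sanitizer_common hint on GCC/Clang; the plain
// fallback for other compilers is an assumption of this sketch.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

namespace sketch {

// Hypothetical shutdown flag standing in for IsAmdgpuRuntimeShutdown().
std::atomic<bool> runtime_shutdown{false};

// Hypothetical allocator entry point: shutdown is the rare case, so the
// early return is marked as the unlikely branch.
void *Allocate(std::size_t size) {
  if (SKETCH_UNLIKELY(runtime_shutdown.load(std::memory_order_acquire)))
    return nullptr;
  return ::operator new(size);
}

}  // namespace sketch
```

The hint does not change behavior; it only tells the compiler to lay out the common (not shut down) path as the fall-through.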


void AmdgpuMemFuncs::Deallocate(void *p) {
  // Deallocate does nothing after AMDGPU runtime shutdown
  if (IsAmdgpuRuntimeShutdown()) {

Should we use UNLIKELY here?


bool AmdgpuMemFuncs::GetPointerInfo(uptr ptr, DevicePointerInfo* ptr_info) {
  // GetPointerInfo returns false after AMDGPU runtime shutdown
  if (IsAmdgpuRuntimeShutdown()) {

Should we use UNLIKELY here?

CHECK_LT(idx, n_chunks_);
h = GetHeader(chunks_[idx], &header);
CHECK(!dev_runtime_unloaded_);
if (dev_runtime_unloaded_)

Should we use UNLIKELY here?

for (uptr i = 0; i < n_chunks_; i++) {
  Header *h = GetHeader(chunks_[i], &header);
  CHECK(!dev_runtime_unloaded_);
  if (dev_runtime_unloaded_)

Should we use UNLIKELY here?

// Device allocator has dependency on device runtime. If device runtime
// is unloaded, GetPointerInfo() will fail. For such case, we can still
// return a valid value for map_beg, map_size will be limited to one page
if (!dev_runtime_unloaded_) {

Should we instead make an UNLIKELY check for unloaded here?
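
The fallback described in the code comment (valid map_beg, map_size limited to one page once the runtime is gone) might look roughly like this. It is a sketch under loud assumptions: the 4096-byte page size, the choice of the chunk address as map_beg, and all names in the `sketch` namespace are hypothetical and not taken from the PR.

```cpp
#include <cstdint>

namespace sketch {

constexpr std::uintptr_t kPageSize = 4096;  // assumed page size

struct MapRange {
  std::uintptr_t map_beg;
  std::uintptr_t map_size;
};

// Stand-in for GetPointerInfo(); pretends every allocation spans 3 pages.
bool QueryRuntime(std::uintptr_t chunk, MapRange *out) {
  out->map_beg = chunk;
  out->map_size = 3 * kPageSize;
  return true;
}

// Start from the conservative one-page answer and refine it only while
// the device runtime is still loaded and can be queried.
MapRange GetMapRange(std::uintptr_t chunk, bool dev_runtime_unloaded) {
  MapRange range{chunk, kPageSize};  // assumption: map_beg == chunk address
  if (!dev_runtime_unloaded)
    QueryRuntime(chunk, &range);
  return range;
}

}  // namespace sketch
```

Written this way, the unloaded case is the default value and the loaded case is the only branch, which is one way to read the reviewer's question about which side of the check should carry the hint.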

3 participants